Measuring precision in legal term mining: a corpus-based validation of single and multi-word term recognition methods
Abstract
Legal terminology presents certain traits that may interfere with its automatic detection, such as its substantial presence in everyday language. This research therefore examines the levels of precision achieved by five single and multi-word term recognition methods on a pilot legal corpus of 2.6 million words, and compares the results with those presented by Marín (2014a). Once the most effective single and multi-word term recognition method has been identified, it is applied to the reference corpus, BLaRC, with the aim of producing a reliable list of legal terms that might be exploited in areas such as English for Specific Purposes (ESP) instruction, Applied Linguistics or Terminology.
Keywords: legal English; ATR methods; corpus linguistics; terminology
Similar resources
Prediction-Based Portfolio Optimization Model for Iran’s Oil Dependent Stocks Using Data Mining Methods
This study applied a prediction-based portfolio optimization model to explore the results of portfolio predicament in the Tehran Stock Exchange. To this aim, first, the data mining approach was used to predict the petroleum products and chemical industry using clustering stock market data. Then, some effective factors, such as crude oil price, exchange rate, global interest rate, gold price, an...
Acronyms as an Integral Part of Multi-Word Term Recognition - A Token of Appreciation
Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain–specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multi–word term...
Mixture of Experts for Persian handwritten word recognition
This paper presents the results of Persian handwritten word recognition based on Mixture of Experts technique. In the basic form of ME the problem space is automatically divided into several subspaces for the experts, and the outputs of experts are combined by a gating network. In our proposed model, we used Mixture of Experts Multi Layered Perceptrons with Momentum term, in the classification ...
A'lam Corpus: A Standard Named Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) problem that is mainly used for text summarization, data mining, data retrieval, question and answering, machine translation, and document classification systems. A NER system is tasked with determining the border of each named entity, recognizing its type and classifying it into predefined categories. The categories of named...
Rule-based Automatic Multi-word Term Extraction and Lemmatization
In this paper we present a rule-based method for multi-word term extraction that relies on extensive lexical resources in the form of electronic dictionaries and finite-state transducers for modelling various syntactic structures of multi-word terms. The same technology is used for lemmatization of extracted multi-word terms, which is unavoidable for highly inflected languages in order to pass ...